Globally convergent system and method for automated model discovery

ABSTRACT

Methods and systems for model discovery include forming a mathematical program based on a set of observational data to generate an objective function and one or more constraints. The mathematical program represents a model space as an expression tree comprising operators and operands. The mathematical program is solved by optimizing the objective function subject to the one or more constraints to determine a model in the model space that best fits the set of observational data.

BACKGROUND

Technical Field

The present invention relates to automatic model discovery and, moreparticularly, combining data-driven methods with primitives offirst-principles based modeling to generate a mathematical model.

Description of the Related Art

Mathematical modeling provides a consistent link between the input andoutput of a system or phenomenon under investigation. These models areused for various objectives such as description, prediction, and design,through a broad range of science and engineering disciplines, includingphysics, chemistry, biology, economics, etc. Historically, researchersand practitioners find consistent and generalized cause and effect linksthrough experiment and formulate mathematical expressions that embodysuch relations. However, modeling complex phenomena, in particular whilerelying on a relatively small number of observations, is difficult toaccomplish.

Traditionally the formulation of mathematical models is attemptedthrough first principles formulations, using modeling primitives such asdifferential operators and integral equations. Such an approach benefitsfrom great generality in the governing relations, but identification ofan appropriate formulation is non-trivial. Such an approach usuallyrelies on human intuition and the frequently tedious comparison ofpredictions with experimental observations.

At the other extreme, modeling can be performed using data drivenapproaches that use generic statistical models, assuming that asufficient amount of input data is provided. Statistical models may belimited in their generalizability and, depending on the capacity andcomplexity of the predictive relation in the prescribed functional form,may require an intractably large set of examples to use as trainingdata.

SUMMARY

A method for model discovery includes forming a mathematical programbased on a set of observational data to generate an objective functionand one or more constraints. The mathematical program represents a modelspace as an expression tree comprising operators and operands. Themathematical program is solved by optimizing the objective functionsubject to the one or more constraints to determine a model in the modelspace that best fits the set of observational data.

A method for model discovery includes forming a mathematical programbased on a set of observational data to generate an objective functionand at least one numerical and structural constraint represented bycontinuous and integer variables. The mathematical program represents amodel space as an expression tree comprising operators and operands. Theobjective function is based on model complexity and on a fidelity ofmodel predictions to observations. The mathematical program is solved byoptimizing the objective function subject to the one or more constraintsto determine a model in the model space that best fits the set ofobservational data.

A system for model discovery includes a mathematical program generationmodule comprising a processor configured to form a mathematical programbased on a set of observational data to generate an objective functionand one or more constraints. The mathematical program represents a modelspace as an expression tree comprising operators and operands. A solveris configured to solve the mathematical program by optimizing theobjective function subject to the one or more constraints to determine amodel in the model space that best fits the set of observational data.

These and other features and advantages will become apparent from thefollowing detailed description of illustrative embodiments thereof,which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The disclosure will provide details in the following description ofpreferred embodiments with reference to the following figures wherein:

FIG. 1 is a diagram of an expression tree modeling a formula inaccordance with the present principles;

FIG. 2 is a block/flow diagram of a method for automatic model discoveryin accordance with the present principles;

FIG. 3 is a block diagram of a method for automatic model discovery inaccordance with the present principles; and

FIG. 4 is a diagram of an expression tree modeling a Newton's law ofgravitation in accordance with the present principles.

DETAILED DESCRIPTION

Embodiments of the present principles provide a compromise between modelfidelity and model complexity, devising both model functional form aswell as its parameters while honing desired modeling goals. These goalsmay include, e.g., precision, universality, invariance, conservation,functional simplicity/description complexity, symmetry preservation,etc.

Unlike conventional data-driven model generation, which estimatesparameters of a given functional form of a model, and unlike humanfirst-principles modeling, the present embodiments provide automateddiscovery of both the model functional form as well as gauges andcalibrates the parameters of the model. A data-driven approach iscombined with primitives of first-principles-based modeling to provide amore universal model. The present embodiments therefore employstochastic programming and mixed integer non-linear programming, wherethe former provides the statistical framework of the model and thelatter provides simultaneous discovery of the functional form and themodel parameters.

These embodiments provide model learning and discovery using relatively(compared to data-driven approaches) small amounts of data. This is dueto the fact that a more flexible functional form of richer operandprimitives is allowed, along with a stricter complexity requirement. Thepresent embodiments also provide universality. By not restricting themodel to any predefined functional form, and allowing a rich set ofoperators (as opposed to the highly structured and limited setdata-driven conventional approaches), as well as by imposing structuralpreferences, the resulting models possess higher generalizationcapacity. For example, one limitation of data driven approaches is theirpoor ability to extrapolate beyond training data ranges. In addition,the attained models are more compact in their functional representation,making their interpretation more direct and intuitive to providemeaningful links between input and output.

To accomplish the above goals, the present embodiments find the simplestmathematical expression, over a certain alphabet including, e.g., theusual sum, difference, product, division, power, logarithm, exponential,sine, and cosine operators, which yields values which are as close aspossible to a set of observations. This can be described by amathematical program involving both continuous and integer variables, aswell as linear and non-linear constraints.

The discrete integer variables enable description of the functional formof the mathematical model, as they can encode discrete choice betweenone alphabet entity or another at a given entry in the expression tree.The continuous counterparts of the discrete integer variables enablemodeling coefficients and variables holding various continuous values.

A mathematical program is a formal sentence of a language whichdescribes optimization problems. It includes the following entities:parameters (the program input data), decision variables (encoding theprogram output or solution), one or more objective functions, andconstraints upon the decision variables. The objective function andconstraints involve functional forms in terms of the parameters anddecision variables. The task of model discovery is formulated herein asthe construction of the mathematical program.

Referring now to FIG. 1, an exemplary mathematical expression 100 isshown. The mathematical expressions are represented as trees or“expression trees.” These trees are combinatorial constructs includingnodes 102 linked by arcs 104. The nodes are organized by rank, where theroot node has rank zero and any subnode of a parent node has a rank onegreater than the parent node's rank. In this way, the definition of rankis inductively extended over the entire tree. Since rank encodes aparent-child relationship between nodes, a tree cannot have any cycles(where a node might be its own ancestor). The nodes of the tree arelabeled with operator names, such as “plus,” “minus,” etc. In theparticular example of FIG. 1, the tree 100 encodes the mathematicalexpression x+4. The root node 102 is labeled by the ‘+’ operator, theleft subnode 104 is labeled by the ‘x’ variable name, while the rightsubnode 104 is labeled by the constant ‘4’.

Many different expressions are equivalent to this expression 100. Forexample 2(0.5×+2) is equivalent to x+4, which has a lower complexity. Ifthe given data are a good fit for the function x+4, then the expression100 will match the data. There are also multiple different expressionsthat have a minimum description complexity. For example, x+4 has thesame complexity as 4+x. The mathematical program thereby implicitlydescribes all globally optimal expressions.

The number of nodes in the tree is minimized to ensure the globalminimality of description complexity, subject to the fact that the treesare feasible if and only if they encode a mathematical expression whichfits the data. The solution to the mathematical program is globallyoptimal with respect to description complexity, subject to a numericaldiscrepancy not exceeding a prescribed limit.

Referring now to FIG. 4, a diagram of the expression tree for Newton'slaw of gravitation is shown in the tree 400. Newton's law of gravitationis written as

${F = \frac{G( {m_{1}m_{2}} )}{r^{2}}},$

which can be readily represented as a tree of operators (e.g.,multiplication, division, and squaring) and operands (e.g., G, r, m1,and m2).

A number of parameters will be used herein:

D represents the dimension of the Euclidean space embedding theobservations.

H represents a set {H₁, H₂, . . . , H_(max)} of observation indices,each representing an observation point (χ^(h)|h∈H) at which a model ƒ isevaluated and an observation value (ƒ^(h)|h∈H) corresponding to theevaluation off at the observation points.

μ is a metric on

^(|H|) quantifying the discrepancy between measured and modeled outputwith an upper bound constant ε>0, such that μ(f, φ)≦ε, wheref=(ƒ(χ^(h))|h∈H) and φ=(φ^(h)|h∈H).

A represents a set of alphabet symbol indices, where the alphabet{⊕_(a), |a∈A} of the formal language

includes mathematical operations of a given arity K_(a) for each a∈A.Since many nodes can be labeled by the same operator symbol ⊕_(a), thequantity C_(a) is used to denote a maximum number of occurrences of theoperator ⊕_(a) in an expression. Every operator ⊕_(a), aside from leafoperators L={var, coef}, has a given number of K_(a) operands. Anexemplary alphabet A may include the operators coef, var, +, −, *, /,̂2, ̂, exp, log, sin, and cos.

V represents a set {(a, i)|a∈A, i≦C_(a)} of all potential nodes in theproblem.

T represents the expression tree formed from a set of nodes N. A modelfinder makes both topological decisions on the structure of T as well asnumerical decisions about the evaluation of T at given observationpoints. Thus there are two classes of decision variables: numericalvariables v and structural variables w. For every node s∈N andobservation h∈H, v_(sh) holds the value of the evaluation at the nodeand w_(skh) holds the corresponding value at the k^(th) subnode of s.

r_(s) is the tree rank of a node s in V. Although rank is an integervalue, the integrality of r_(s) is enforced by linear constraintsinvolving the binary structural variable σ, defined below. As such,r_(s) can be defined as a continuous variable.

Δ_(g) is the description complexity of the mathematical expression g ofthe expression tree T, defined as Δ_(g)=|N|+|A|, the sum of the numberof nodes and the number of operators. This can be written as a functionof the decision variables as Δ_(g)=Σ_(s∈V) α_(s)+Σ_(s,t∈V) σ_(ts), usingthe binary parameters α and σ defined below. One goal is to minimize thedescription complexity of the discovered model ƒ, and this goal may beaccomplished by, e.g., limiting a number of occurrences of a givenoperator or by limiting a rank of the tree T.

The following structural variables are considered to be binary:

For each s ∈V, α_(s)=1 if and only if s ∈N (i.e., s is part of the treeT).

For each s ∈V, ρ_(s)=1 if and only if s is the root node of T.

For each s ∈V and d≦D, γ_(sd)=1 if and only if s is a variable node(i.e., s=(var, i) for some i≦C_(var) and s is the variable representingthe d^(th) coordinate x_(d). The vector x=(x₁, . . . , x_(D)) representsarguments of the function ƒ(x) to be modeled. The model is learnedthrough multiple instances of χ={χ₁, . . . , χ_(h)}. D is the number ofarguments of ƒ(x), such that the vector x belongs to a space ofdimension D. x_(d) is therefore the d^(th) coordinate of the vector xor, alternatively and equivalently, the d^(th) argument of the functionƒ(x)=ƒ(x₁, . . . , x_(D)).

For each s=(a, i), t=(b, j)∈V and k≦K_(b), β_(stk)=1 if and only if s isthe k^(th) operand of t.

For each s=(a, i), t=(b, j)∈V, σ_(st)=1 if and only if s is an operandof t.

The model is characterized by two sources of error: δ and ε. The formermeasures how well g approximates the unknown function ƒ, while thelatter is a set upper bound for the discrepancy between the modeled andmeasured output. ε is defined by the user, to bound model fidelity(e.g., discrepancy between the true observations and those evaluatedusing the model). In some situations, the user may deliberatelyprescribe specific bounds to satisfy a desired confidence in an end-goalquantity, in other cases the user may determine such bound based upon adesired trade-off between affordable complexity and model error. Inother cases the user may determine such bound based upon the budget ofobservations and the anticipated complexity of the model. The error δhas ε as its upper bound, such that the global optimum for ƒ is definedwith respect to description complexity Δ_(g), with driving δ down as asecondary priority. Provided ε<1, which can be enforced by setting anerror metric μ, and because Δ_(g) can only change by at least one unitat a time (the value difference in any binary decision variable α_(s)),the two objectives can be scalarized to obtain the objective functionmin Σ_(s∈V) α_(s)+δ. μ is a measure of the model output error, used toquantify the pair-wise discrepancy between each data observation of agiven input and the modeled output for the same input. The error modelcan be as simple as least squares difference, or alternatively harnessmore sophisticated statistical error measures if desired. Under theobjective function, δ quantifies how well the model output predictionsmatch the true data measurements, so that the objective functionaccounts for both model fidelity, through δ, as well as modelcomplexity, through α_(s).

The objective function is solved subject to a set of constraints. Theseconstraints may include one or more of numerical value constraints,structural constraints, mixed constraints, and implied constraints. Theconstraints are not limited to those described herein, and it should beunderstood that those having ordinary skill in the art will be able toextend or restrict the set of constraints in a manner appropriate totheir application.

Numerical constraints assign correct values of the nodes of the tree T,so that the v variables will obtain correct values for each input-outputinstance. An example of a numerical constraint is that, if an operatoris a coefficient, its value must be the same over all of the observationpoints.

Structural constraints enforce rules regarding the structure of the treeT. Examples of structural constraints include:

Σ_(s∈V) ρs=1 (the tree T has only one root node);

∀s∈V, ρ_(s)≦α_(s) (a root node is used);

∀q, s, t=(b,j)∈V and k≦K_(b), β_(stk)≦1−ρ_(q) (if the root is a leaf,then g=x_(d) for some d≦D or g is a constant, in which case no arcexists between nodes);

∀s∈V, t=(b,j)∈L, and k≦K_(b), β_(stk)=0 (for leaf operators with nochildren, the corresponding β is zero);

∀s=(a, i)∈V and k≦K_(b), β_(ssk)=0 (the parent-child relationship isirreflexive);

∀s, t=(b,j)∈V, Σ_(k≦K) _(b) β_(stk)≦1 (an operand node is not matched totwo different child nodes);

∀s, t=(b, j)∈V, k≦K_(b) β_(stk)≦α_(s) and ∀s, t(b,j)∈V, k≦K_b β≦α_t(only match used nodes);

$\forall{s \in {{V\mspace{14mu} {\sum\limits_{\underset{t \notin {L\bigcup{\{ s\}}}}{t = {{({b,j})} \in V}}}\; {\sum\limits_{k \leq K_{b}}\; \beta_{stk}}}} \geq {\alpha_{s} - \rho_{s}}}}$

(a used, non-root node must be matched to some parent (non-leaf) nodeother than itself);

∀s∈V\{var}, d≦D γ_sd=0 (only match {var} operator nodes to data points);

∀i≦C_({var})Σ_(d≦D) γ_({(var,i)d})≧α_((var,i)) (each {var} operator ismatched to at least one data component if used);

∀d≦D Σ_(i≦C) _({var}) γ_({(var,i)d})≧1 (each observation point isattached to at least one {var} node);

∀s, t∈V σ_(st)≦1−α_(s) (root nodes cannot be child nodes);

∀s, t∈L σ_(st)=0 (leaf nodes cannot be in a parent-child relationshipwith one another);

$\forall{s \in {{V\mspace{14mu} {\sum\limits_{\underset{t \neq s}{t \in V}}\; \sigma_{ts}}} \geq \rho_{s}}}$

(the root node is a parent node);

∀i≦C_({var})Σ_(t∈V\L) σ_({(var,i),t})=α_(({var},i)) (every used {var}node is a child of a non-leaf node);

∀s∈V\L,t=(b,j)∈V, k≦K_(b) β_(stk)≦σ_(St) and ∀s∈V\L, t=(b,j)∈V Σ_(k≦K)_(b) β_(stk)≧σ_(st) (projection of β variables on σ variables);

∀s∈V r_(s)≦(1−ρ_(s))|V| (the rank of the root node is zero);

∀s∈V r_(s)≦α_(s)|V| (the rank of unused nodes is zero); and

∀s, t∈V r_(s)+1−(1−σ_(st))(|V|+1)≦r_(t)≦r_(s)+1+(1−σ_(st))(|V|+1) (if tis the parent of s, then its rank is the rank of s plus 1).

Mixed constraints include features of numerical constraints andstructural constraints. For example:

∀s∈V, t(b,j)∈V, k≦K_(b), h∈H β_(stk)(v_(sh)−w_(tkh))=0 (numerical valuesof nodes stored in v and w variables are equal if there is aparent-child relationship in the respective indices); and

∀d≦D, h∈H,i≦C_({var})γ_({({var},i)d})(v_(({var},i),h)−χ_(d) ^(h))=0(numerical values of {var} nodes stored in v are equal to observationpoints if the {var} node is assigned to the corresponding coordinate.

Some constraints are implied by the structure of a tree and mathematicalexpressions. For example, no node can be its own parent, and nosub-expression of the form x_(d)−x_(d) can ever appear in g.

Solving the objective function (min Σ_(s∈v) α_(s)+δ) subject to one ormore constraints produces a mathematical expression ƒ that represents afit to the observational data H. This mathematical program representsmodel discovery, and finding the globally optimal solution produces thediscovered model. This solution can be discovered using conventionalsolver software, either off-the-shelf or specially designed for thispurpose.

Referring now to FIG. 2, a method for discovering a model is shown.Block 202 collects a set of measurements, including whatever inputs wereused to generate the measurements. Block 204 Formulates the mathematicalprogram as described above, establishing a set of parameters and anobjective function. Block 206 formulates the constraints on theobjective function according to one or more structural, numerical,mixed, and/or implied constraints. Block 208 solves the mathematicalprogram in view of the constraints. This solution can be performedusing, e.g., an off-the-shelf solver.

The present invention may be a system, a method, and/or a computerprogram product. The computer program product may include a computerreadable storage medium (or media) having computer readable programinstructions thereon for causing a processor to carry out aspects of thepresent invention.

The computer readable storage medium can be a tangible device that canretain and store instructions for use by an instruction executiondevice. The computer readable storage medium may be, for example, but isnot limited to, an electronic storage device, a magnetic storage device,an optical storage device, an electromagnetic storage device, asemiconductor storage device, or any suitable combination of theforegoing. A non-exhaustive list of more specific examples of thecomputer readable storage medium includes the following: a portablecomputer diskette, a hard disk, a random access memory (RAM), aread-only memory (ROM), an erasable programmable read-only memory (EPROMor Flash memory), a static random access memory (SRAM), a portablecompact disc read-only memory (CD-ROM), a digital versatile disk (DVD),a memory stick, a floppy disk, a mechanically encoded device such aspunch-cards or raised structures in a groove having instructionsrecorded thereon, and any suitable combination of the foregoing. Acomputer readable storage medium, as used herein, is not to be construedas being transitory signals per se, such as radio waves or other freelypropagating electromagnetic waves, electromagnetic waves propagatingthrough a waveguide or other transmission media (e.g., light pulsespassing through a fiber-optic cable), or electrical signals transmittedthrough a wire.

Computer readable program instructions described herein can bedownloaded to respective computing/processing devices from a computerreadable storage medium or to an external computer or external storagedevice via a network, for example, the Internet, a local area network, awide area network and/or a wireless network. The network may comprisecopper transmission cables, optical transmission fibers, wirelesstransmission, routers, firewalls, switches, gateway computers and/oredge servers. A network adapter card or network interface in eachcomputing/processing device receives computer readable programinstructions from the network and forwards the computer readable programinstructions for storage in a computer readable storage medium withinthe respective computing/processing device.

Computer readable program instructions for carrying out operations ofthe present invention may be assembler instructions,instruction-set-architecture (ISA) instructions, machine instructions,machine dependent instructions, microcode, firmware instructions,state-setting data, or either source code or object code written in anycombination of one or more programming languages, including an objectoriented programming language such as Smalltalk, C++ or the like, andconventional procedural programming languages, such as the “C”programming language or similar programming languages. The computerreadable program instructions may execute entirely on the user'scomputer, partly on the user's computer, as a stand-alone softwarepackage, partly on the user's computer and partly on a remote computeror entirely on the remote computer or server. In the latter scenario,the remote computer may be connected to the user's computer through anytype of network, including a local area network (LAN) or a wide areanetwork (WAN), or the connection may be made to an external computer(for example, through the Internet using an Internet Service Provider).In some embodiments, electronic circuitry including, for example,programmable logic circuitry, field-programmable gate arrays (FPGA), orprogrammable logic arrays (PLA) may execute the computer readableprogram instructions by utilizing state information of the computerreadable program instructions to personalize the electronic circuitry,in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference toflowchart illustrations and/or block diagrams of methods, apparatus(systems), and computer program products according to embodiments of theinvention. It will be understood that each block of the flowchartillustrations and/or block diagrams, and combinations of blocks in theflowchart illustrations and/or block diagrams, can be implemented bycomputer readable program instructions.

These computer readable program instructions may be provided to aprocessor of a general purpose computer, special purpose computer, orother programmable data processing apparatus to produce a machine, suchthat the instructions, which execute via the processor of the computeror other programmable data processing apparatus, create means forimplementing the functions/acts specified in the flowchart and/or blockdiagram block or blocks. These computer readable program instructionsmay also be stored in a computer readable storage medium that can directa computer, a programmable data processing apparatus, and/or otherdevices to function in a particular manner, such that the computerreadable storage medium having instructions stored therein comprises anarticle of manufacture including instructions which implement aspects ofthe function/act specified in the flowchart and/or block diagram blockor blocks.

The computer readable program instructions may also be loaded onto acomputer, other programmable data processing apparatus, or other deviceto cause a series of operational steps to be performed on the computer,other programmable apparatus or other device to produce a computerimplemented process, such that the instructions which execute on thecomputer, other programmable apparatus, or other device implement thefunctions/acts specified in the flowchart and/or block diagram block orblocks.

The flowchart and block diagrams in the Figures illustrate thearchitecture, functionality, and operation of possible implementationsof systems, methods, and computer program products according to variousembodiments of the present invention. In this regard, each block in theflowchart or block diagrams may represent a module, segment, or portionof instructions, which comprises one or more executable instructions forimplementing the specified logical function(s). In some alternativeimplementations, the functions noted in the block may occur out of theorder noted in the figures. For example, two blocks shown in successionmay, in fact, be executed substantially concurrently, or the blocks maysometimes be executed in the reverse order, depending upon thefunctionality involved. It will also be noted that each block of theblock diagrams and/or flowchart illustration, and combinations of blocksin the block diagrams and/or flowchart illustration, can be implementedby special purpose hardware-based systems that perform the specifiedfunctions or acts or carry out combinations of special purpose hardwareand computer instructions.

Reference in the specification to “one embodiment” or “an embodiment” ofthe present principles, as well as other variations thereof, means thata particular feature, structure, characteristic, and so forth describedin connection with the embodiment is included in at least one embodimentof the present principles. Thus, the appearances of the phrase “in oneembodiment” or “in an embodiment”, as well any other variations,appearing in various places throughout the specification are notnecessarily all referring to the same embodiment.

It is to be appreciated that the use of any of the following “/”,“and/or”, and “at least one of”, for example, in the cases of “A/B”, “Aand/or B” and “at least one of A and B”, is intended to encompass theselection of the first listed option (A) only, or the selection of thesecond listed option (B) only, or the selection of both options (A andB). As a further example, in the cases of “A, B, and/or C” and “at leastone of A, B, and C”, such phrasing is intended to encompass theselection of the first listed option (A) only, or the selection of thesecond listed option (B) only, or the selection of the third listedoption (C) only, or the selection of the first and the second listedoptions (A and B) only, or the selection of the first and third listedoptions (A and C) only, or the selection of the second and third listedoptions (B and C) only, or the selection of all three options (A and Band C). This may be extended, as readily apparent by one of ordinaryskill in this and related arts, for as many items listed.

Referring now to FIG. 3, a system 300 for model discovery is shown. Thesystem 300 includes a hardware processor 302 in communication withmemory 304. The memory 304 stores a set of observational data points306. A program generation module 308 generates a mathematical programthat represents parameters for finding a model to fit the collected data306. A constraint module 310 includes or creates a set of constraintsappropriate to the mathematical program. A solver 312 executes themathematical program, optimizing the objective function in view of theconstraints to find the model ƒ that best fits the collected data 306.

Having described preferred embodiments of a system and method forautomated model discovery from limited data (which are intended to beillustrative and not limiting), it is noted that modifications andvariations can be made by persons skilled in the art in light of theabove teachings. It is therefore to be understood that changes may bemade in the particular embodiments disclosed which are within the scopeof the invention as outlined by the appended claims. Having thusdescribed aspects of the invention, with the details and particularityrequired by the patent laws, what is claimed and desired protected byLetters Patent is set forth in the appended claims.

1. A method for model discovery, comprising: forming a mathematicalprogram based on a set of observational data to generate an objectivefunction and one or more constraints, wherein the mathematical programrepresents a model space as an expression tree comprising operators andoperands; and solving the mathematical program by optimizing theobjective function subject to the one or more constraints to determine amodel in the model space that best fits the set of observational data.2. The method of claim 1, wherein the operators in the model spaceinclude variables, coefficients, and mathematical operations.
 3. Themethod of claim 1, wherein the constraints include numerical constraintsand structural constraints, represented by both continuous and integervariables.
 4. The method of claim 1, wherein the objective function isbased on model complexity and on a fidelity of model predictions toobservations.
 5. The method of claim 4, wherein a measure of modelcomplexity comprises a respective maximum number for each operator thatrepresents the maximum number of times that the operator can appear inthe model.
 6. The method of claim 4, wherein a measure of modelcomplexity comprises a maximum number of operators in the model.
 7. Themethod of claim 4, wherein optimizing the objective function comprisesminimizing the function Σ_(s∈V) α_(s)+δ, wherein s is a node in the setof possible nodes V, α_(s) is a binary parameter that represents whetherthe node s is present in the expression tree, and δ is a measure ofmodel fidelity.
 8. A computer readable storage medium comprising acomputer readable program for model discovery, wherein the computerreadable program when executed on a computer causes the computer toperform the steps of claim
 1. 9. A method for model discovery,comprising: forming a mathematical program based on a set ofobservational data to generate an objective function and at least onenumerical and structural constraint represented by continuous andinteger variables, wherein the mathematical program represents a modelspace as an expression tree comprising operators and operands andwherein the objective function is based on model complexity and on afidelity of model predictions to observations; and solving themathematical program by optimizing the objective function subject to theone or more constraints to determine a model in the model space thatbest fits the set of observational data.
 10. The method of claim 9,wherein the operators in the model space include variables,coefficients, and mathematical operations.
 11. The method of claim 9,wherein a measure of model complexity comprises a respective maximumnumber for each operator that represents the maximum number of timesthat the operator can appear in the model.
 12. The method of claim 9,wherein a measure of model complexity comprises a maximum number ofoperators in the model.
 13. The method of claim 9, wherein optimizingthe objective function comprises minimizing the function Σ_(s∈V)α_(s)+δ, wherein s is a node in the set of possible nodes V, α_(s) is abinary parameter that represents whether the node s is present in theexpression tree, and δ is a measure of model fidelity.
 14. A system formodel discovery, comprising: a mathematical program generation modulecomprising a processor configured to form a mathematical program basedon a set of observational data to generate an objective function and oneor more constraints, wherein the mathematical program represents a modelspace as a an expression tree comprising operators and operands; and asolver configured to solve the mathematical program by optimizing theobjective function subject to the one or more constraints to determine amodel in the model space that best fits the set of observational data.15. The system of claim 14, wherein the operators in the model spaceinclude variables, coefficients, and mathematical operations.
 16. Thesystem of claim 14, wherein the constraints include numericalconstraints and structural constraints, represented by both continuousand integer variables.
 17. The system of claim 14, wherein the objectivefunction is based on model complexity and on a fidelity of modelpredictions to observations.
 18. The system of claim 17, wherein ameasure of model complexity comprises a respective maximum number foreach operator that represents the maximum number of times that theoperator can appear in the model.
 19. The system of claim 17, wherein ameasure of model complexity comprises a maximum number of operators inthe model.
 20. The system of claim 17, wherein optimizing the objectivefunction comprises minimizing the function Σ_(s∈V) α_(s)+δ, wherein s isa node in the set of possible nodes V, α_(s) is a binary parameter thatrepresents whether the node s is present in the expression tree, and δis a measure of model fidelity.